An R Package to Read CCCCO MIS Files
Christian Million
Data Analyst
Yosemite Community College District
comis?
An internally developed R package
Read and Format:
MIS Submission Files
MIS Referential Files
Every term, someone at your college converts SIS data into .DAT files, using the file specs found in the Data Element Dictionary.
These are submission files.
After submission, colleges request referential files from the CCCCO.
These contain elements derived from submission files, explicit formatting, and additional student information.
We want to analyze it!
Monitoring Student Success and Equity
Accountability
Categorical Funding (EOPS, DSPS, Perkins, …)
Student Centered Funding Formula
Research
…
Submission files: ~25 files | 396 elements
Fixed Width Format
No Column Names
Numbers that should be characters / dates
Missing values (NA)
Trailing white space
Implied decimal points
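To make the fixed-width issues above concrete, here is a minimal sketch of reading a file with no header row while keeping every field as character. The three-field layout, the column names, and the sample records are hypothetical and are not taken from the Data Element Dictionary.

```r
library(readr)

# Hypothetical three-field layout, NOT a real DED record spec.
toy_names  <- c("ID", "TITLE", "UNITS")
toy_widths <- c(4, 10, 4)

# Two fixed-width records: no header row, text padded with spaces.
toy_lines <- c(
  "0012ENGLISH   0300",
  "0345MATH      0450"
)
toy_path <- tempfile(fileext = ".dat")
writeLines(toy_lines, toy_path)

toy <- read_fwf(
  toy_path,
  col_positions = fwf_widths(toy_widths, col_names = toy_names),
  col_types = cols(.default = col_character()),  # keep leading zeros
  trim_ws = TRUE                                 # drop the space padding
)
```

Reading everything as character first avoids silently turning codes such as "0012" into the number 12; numeric and date columns can then be converted deliberately.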
Referential files: ~27 files | 406 elements
Tab Delimited :)
No Column Names
Numbers that should be characters / dates
Missing values (NA)
Trailing white space
Implied decimal points
Different date format than submission file.
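The implied decimal and date issues listed above could be handled with two small helpers like the ones sketched below. The helper names, the two implied decimal places, and both date formats are assumptions made for illustration; the actual formats should be taken from the Data Element Dictionary and the referential file documentation.

```r
# Hypothetical helper: convert digit strings with an implied decimal point.
# `places` is the number of digits to the right of the implied point.
implied_decimal <- function(x, places) {
  x <- trimws(x)
  x[x == ""] <- NA          # blank fields become missing values
  as.numeric(x) / 10^places
}

# Hypothetical helper: parse dates stored as text.
# `format` must match the file being read (assumed here, not from the DED).
parse_mis_date <- function(x, format) {
  x <- trimws(x)
  x[x == ""] <- NA
  as.Date(x, format = format)
}

units <- implied_decimal(c("0300", "0450", "    "), places = 2)

# The same date written in two assumed formats, one per file family.
date_a <- parse_mis_date("20220825", format = "%Y%m%d")
date_b <- parse_mis_date("08/25/2022", format = "%m/%d/%Y")
```

Passing the format explicitly keeps the difference between submission and referential dates visible at the call site instead of buried in a script.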
A lot to re-remember
Cognitively taxing to implement
Takes time
Updates to multiple scripts
Copy / paste errors
Makes scripts more difficult to read
Unfulfilling
Lots of overhead before analysis can begin
comis
library(dplyr)
library(readr)
# Column specs transcribed by hand from the Data Element Dictionary
CB_col_names <- c('GI90', 'GI01', 'GI03', paste0("CB0", 0:9), paste0("CB", 10:27), "Filler")
CB_col_types <- strrep("c", length(CB_col_names)) # compact spec: one "c" per column
CB_col_width <- c(2,3,3,12,12,68,6,1,1,length(109:112),length(113:116),1,1,1,1,1,1,6,8,length(137:148),length(149:160),length(161:172),7,9,1,1,1,1,1,1,1,26)
XB_col_names <- c('GI90', 'GI01', 'GI03', 'GI02', 'CB01', paste0('XB0', 0:9), 'XB10', 'XB11', 'XB12', 'CB00', 'Filler')
XB_col_types <- strrep("c", length(XB_col_names))
XB_col_width <- c(2,3,3,3,12,6,1,6,6,1,length(44:47),length(48:51),1,1,1,1,length(56:61),1,12,7)
# Submission files are fixed width, so read them with read_fwf() and the widths above
CB_src <- readr::read_fwf("path/to/U59223CB.dat",
                          col_positions = readr::fwf_widths(CB_col_width, CB_col_names),
                          col_types = CB_col_types,
                          trim_ws = TRUE)
XB_src <- readr::read_fwf("path/to/U59223XB.dat",
                          col_positions = readr::fwf_widths(XB_col_width, CB_col_names), # copy / paste error: should be XB_col_names
                          col_types = XB_col_types,
                          trim_ws = TRUE)
# date_cleaning_code() and implicit_decimal_code() stand in for the per-column cleaning logic
CB <- CB_src |>
  mutate(dates = date_cleaning_code(),
         units = implicit_decimal_code())
XB <- XB_src |>
  mutate(dates = date_cleaning_code(),
         units = implicit_decimal_code())
comis
Contains useful data found on CCCCO websites
Read many files at once
Read from repo
Use DED Name or Descriptive Name
comis
Easier to tell what’s happening
Reduces cognitive overhead
Get to analysis faster and with more confidence
Documentation contained within the package
Updates made in one spot (instead of throughout various scripts)
Shifts focus to what’s important - Using the Data
Addresses problems specific to the institution
Reasonable defaults
Abstracts common tasks
Maintainable
Share code with others
Documentation / Vignettes
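The idea of abstracting common tasks behind reasonable defaults, with updates made in one spot, can be sketched as a small spec registry plus a single reader function. This is only an illustration of the pattern and not the comis API: `file_specs`, `read_fixed()`, the file types "AA" and "BB", and their layouts are all hypothetical.

```r
library(readr)

# Hypothetical spec registry: one entry per file type.
# Names and widths are illustrative, not real DED layouts.
file_specs <- list(
  AA = list(names = c("ID", "CODE"),  widths = c(4, 2)),
  BB = list(names = c("ID", "TITLE"), widths = c(4, 10))
)

# Hypothetical helper: look up the spec by file type, then read the file.
read_fixed <- function(path, type, specs = file_specs) {
  spec <- specs[[type]]
  if (is.null(spec)) stop("No spec registered for file type: ", type)
  read_fwf(
    path,
    col_positions = fwf_widths(spec$widths, col_names = spec$names),
    col_types = cols(.default = col_character()),  # default: everything as text
    trim_ws = TRUE
  )
}

# Usage with a small temporary file.
aa_path <- tempfile(fileext = ".dat")
writeLines(c("0012A1", "0345B2"), aa_path)
aa <- read_fixed(aa_path, "AA")
```

Because names and widths live in a single object, a layout change is fixed once, and a script can no longer pair one file with another file's column names.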
DisImpact (“internal” to CCCCO)
yccdDB (creates and manages DB connections / queries)
hub (.Rmd/.Qmd storage and usage monitoring)
Reading / processing other CCC files: SEA, VFS, SCFF, etc.
yccdTemplates (project / analysis / report templates)
yccdThemes (branding graphs / reports)
yccdTerms (help with term math / formatting)
Pier to Pier | 2022-08-25